A Study of Loop Unrolling for VLIW-based DSP Processors

Authors

  • Suleyman Sair
  • David R. Kaeli
  • Waleed Meleis
Abstract

With the growing popularity of DSPs and their associated applications, cost-effective software development has become a major issue. High-level language compilers are becoming more commonplace in the DSP world. While these compilers can generate correct code for DSP architectures, there remains considerable room for performance improvements. This paper addresses issues related to DSP compilation, focusing specifically on unrolling techniques proposed for VLIW-based DSP architectures.
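As a rough illustration of the transformation the abstract refers to, the sketch below unrolls a dot-product loop by a factor of four. It is not taken from the paper: the kernel, the unroll factor, and the names `dot_rolled` and `dot_unrolled4` are assumptions chosen for the example. The four independent accumulators are what would let a VLIW scheduler issue several multiply-accumulates in the same cycle, and the larger loop body hints at the code-size cost of unrolling.

```c
#include <stddef.h>

/* Baseline loop: one multiply-accumulate per iteration, with every
 * iteration depending on the previous value of acc. */
int dot_rolled(const short *a, const short *b, size_t n)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Hypothetical unroll-by-4 version (the factor is an assumption).
 * The four accumulators are independent of each other, so a compiler
 * for a VLIW target is free to schedule the four multiply-accumulates
 * in parallel. */
int dot_unrolled4(const short *a, const short *b, size_t n)
{
    int acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;

    /* Main unrolled body: handles four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }

    /* Remainder loop: handles the last n % 4 elements. */
    for (; i < n; i++)
        acc0 += a[i] * b[i];

    return acc0 + acc1 + acc2 + acc3;
}
```

Because the arithmetic is on integers, both versions return the same result whenever the sum does not overflow; with floating-point data the reordering of additions could change rounding.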

Similar articles

Global Trade-off between Code Size and Performance for Loop Unrolling on VLIW Architectures

Many media processors [28, 7, 14, 8, 18, 27], used for computing intensive embedded applications, are VLIW architectures that rely on the compiler to exploit Instruction Level Parallelism. Loop unrolling is generally used to expose instruction parallelism but computing the unrolling factor is very difficult as instruction cache misses and spill code can cancel the expected benefit of the transforma...

UFC: a Global Trade-off Strategy for Loop Unrolling for VLIW Architecture

In order to minimize code size overhead on VLIW architectures, compilers for embedded processors have to pay higher attention on code optimization than on compilation time. Thus, the first demand on compiler for embedded processors consists in spending instruction memory space for optimization only if the associated performance improvement justifies it. In this paper, we propose a novel method ba...

Implementing Click IP Router Kernel on VLIW Architectures

In this work, we implemented the Click IP Router Kernel in C language provided by Scott Webber et al. for two VLIW processors designed for DSP purpose, namely the Philips Trimedia TM1300 processor and Texas Instrument TMS320C6701 processor. The performance of these processors are compared with those of three other processors, ARM SA-110, HPL-PD EPIC, and Intel IXP1200 [1]. Ways of further perfo...

Optimization of SAD Algorithm on VLIW DSP

SAD (Sum of Absolute Difference) algorithm is heavily used in motion estimation which is computationally highly demanding process in motion picture encoding. To enhance the performance of motion picture encoding on a VLIW processor, an efficient implementation of SAD algorithm on the VLIW processor is essential. SAD algorithm is programmed as a nested loop with a conditional branch. In VLIW pro...

Assembly Code Conversion of Software-Pipelined Loop between two VLIW DSP Processors

In order to fully utilize the instruction level parallelism of VLIW DSP processors, DSP programs have to be optimized by software pipelining. Software pipelining has been studied for many years and widely implemented in optimizing compilers. However, due to the rearrangement of the original instructions, it is often very difficult to re-use or port the code of a software-pipelined loop to other...

Publication date: 1998